Improving the Front-End Noise Preprocessor of MELPe
Abstract
In this paper we focus on improving the noise preprocessor (NPP) of the low-rate speech coder MELPe using information from the non-acoustic General Electromagnetic Motion Sensor (GEMS). A generalized linear model approach is proposed to improve voice activity estimation, both at the frame level in the time domain and at the bin level in the frequency domain, using GEMS and context features. HMM-based speech recognition techniques are also investigated to drive the estimators. The improved voice activity parameter estimators are shown to have significantly less error than the estimates from the MELPe NPP. The improved frame-level voice activity estimator achieves a 66% reduction in error, and the improved bin-level voice activity estimates achieve an error reduction of more than 50%. With an optimal spectral amplitude estimation algorithm in place of the MM-LSA algorithm used in the MELPe NPP, together with the improved voice activity parameters, the processed noisy speech has much less residual noise and higher intelligibility in informal listening tests.
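The abstract does not give the model details, so the following is only a minimal sketch of what a frame-level voice activity estimator built as a generalized linear model with context features might look like. It assumes a logistic link, synthetic data, and two hypothetical per-frame features (`acoustic_energy` and `sensor_energy`, the latter standing in for a GEMS-derived cue). None of these names, values, or training choices come from the paper.

```python
import numpy as np

def add_context(features, width=2):
    """Stack each frame with `width` neighbouring frames on each side.

    Edge frames are padded by repeating the first/last frame.
    """
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    n = features.shape[0]
    return np.hstack([padded[i:i + n] for i in range(2 * width + 1)])

def add_bias(x):
    """Append a constant column so the model has an intercept term."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

def fit_logistic_glm(x, y, lr=0.5, steps=2000):
    """Fit a logistic-link GLM by batch gradient descent on the mean log-loss."""
    x1 = add_bias(x)
    w = np.zeros(x1.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x1 @ w)))
        w -= lr * x1.T @ (p - y) / len(y)
    return w

def predict_proba(x, w):
    """Return the estimated probability of voice activity for each frame."""
    return 1.0 / (1.0 + np.exp(-(add_bias(x) @ w)))

# Synthetic data (assumption): alternating 50-frame speech and silence segments.
rng = np.random.default_rng(0)
n_frames = 400
voiced = ((np.arange(n_frames) // 50) % 2).astype(float)

# Hypothetical features: a noisy acoustic cue and a cleaner non-acoustic cue.
acoustic_energy = voiced + rng.normal(0.0, 1.0, n_frames)
sensor_energy = voiced + rng.normal(0.0, 0.3, n_frames)
features = np.column_stack([acoustic_energy, sensor_energy])

x = add_context(features, width=2)   # 5 frames x 2 features = 10 inputs
w = fit_logistic_glm(x, voiced)
predictions = (predict_proba(x, w) > 0.5).astype(float)
error_rate = np.mean(predictions != voiced)
print(f"frame-level training error: {error_rate:.3f}")
```

The same structure could in principle be applied per frequency bin for a bin-level estimate, with one feature vector per bin instead of per frame. The error here is measured on the training data only, so it says nothing about the figures reported in the abstract.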
Similar resources
RFC 8130 RTP Payload Format for MELPe Codec March 2017
This document describes the RTP payload format for the Mixed Excitation Linear Prediction Enhanced (MELPe) speech coder. MELPe’s three different speech encoding rates and sample frame sizes are supported. Comfort noise procedures and packet loss concealment are described in detail.
Dual-microphone Robust Front-end for Arm’s-length Speech Recognition
This paper describes a novel method of improving the performance of a speech recognition front-end in non-stationary background noise. A two-microphone array has been designed that both enhances the speech and provides a continuous estimate of the background noise. This processing has been integrated with the standard ETSI DSR Advanced Front End so that the continuous noise estimate is an input...
Improving the noise and spectral robustness of an isolated-word recognizer using an auditory-model front end
In this study, the performance of an auditory-model feature-extraction “front end” was assessed in an isolated-word speech recognition task using a common hidden Markov model (HMM) “back end”, and compared with the performance of other feature representation front-end methods including mel-frequency cepstral coefficients (MFCC) and two variants (J- and L-) of the relative spectral amplitude (RASTA...
A psychoacoustical model of the auditory periphery as front end for ASR
The application of a psychoacoustical model of the auditory periphery in the field of automatic speech recognition (ASR) is presented. The model was developed to quantitatively predict human performance in typical spectral and temporal masking experiments. Speaker-independent, isolated-digit recognition experiments in different types of noise were carried out to evaluate the robustness of the a...
Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?
Using deep neural networks (DNNs) for automatic speech recognition (ASR) has recently attracted much attention due to the large performance improvement they provide for a variety of tasks. DNNs are known to be robust to overfitting and to be able to remove speaker variability. Another important cause of variability in speech is the presence of noise. A lot of research has been undertaken on noi...
Publication date: 2004